Efficient block-based sampling algorithm for aggregation query processing on duplicate charged records
PAN Mingyu, ZHANG Lu, LONG Guobiao, LI Xianglong, MA Dongxue, XU Liang
Journal of Computer Applications    2018, 38 (6): 1596-1600.   DOI: 10.11772/j.issn.1001-9081.2017112632
Existing query analysis methods usually treat entity resolution as an offline preprocessing step that cleans the whole dataset. However, as data volumes keep growing, such computationally expensive offline cleaning can hardly meet the real-time analysis needs of most applications. To support aggregation queries over records that contain duplicates, a new method integrating entity resolution with approximate aggregation query processing was proposed. First, a block-based sampling strategy was adopted to draw samples. Then, an entity resolution method was applied to identify duplicate entities within the sample. Finally, an unbiased estimate of the aggregate result was reconstructed from the entity resolution results. The proposed method avoids the cost of resolving all entities and, by resolving only a small sample of the data, returns query results that satisfy user requirements. Experimental results on both a real dataset and a synthetic dataset demonstrate the efficiency and reliability of the proposed method.
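The three steps (sample blocks, resolve duplicates in the sample, scale up to an unbiased estimate) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that all duplicates of an entity share a blocking key and therefore fall in the same block, that blocks are drawn by simple random sampling without replacement, and that entity resolution is a toy exact match on a normalized name. The record layout and the names `group_into_blocks`, `resolve_block` and `estimate_sum` are hypothetical.

```python
import random
from collections import defaultdict

def group_into_blocks(records, block_key):
    """Partition records by a blocking key (assumption: duplicates share a key)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    return blocks

def resolve_block(block, entity_key):
    """Toy entity resolution: records with equal entity_key are one entity.
    Keeps the first record seen as the entity's representative."""
    entities = {}
    for rec in block:
        entities.setdefault(entity_key(rec), rec)
    return list(entities.values())

def estimate_sum(records, value, block_key, entity_key, m, seed=0):
    """Estimate SUM(value) over distinct entities by sampling m blocks.

    Scaling the sampled, deduplicated block totals by N/m (N = number of
    blocks) is unbiased under simple random sampling of blocks.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    blocks = group_into_blocks(records, block_key)
    keys = sorted(blocks)              # fixed order so the seed is reproducible
    n_blocks = len(keys)
    m = min(m, n_blocks)
    sampled = random.Random(seed).sample(keys, m)
    sample_total = 0.0
    for k in sampled:
        # Only the sampled blocks pay the entity-resolution cost.
        sample_total += sum(value(r) for r in resolve_block(blocks[k], entity_key))
    return n_blocks / m * sample_total

# Hypothetical records: (name, city, amount); city is the blocking key.
records = [
    ("Alice", "Paris", 10.0),
    ("alice ", "Paris", 10.0),  # duplicate of Alice
    ("Bob", "Paris", 5.0),
    ("Carol", "Rome", 7.0),
    ("Dan", "Oslo", 3.0),
    ("dan", "Oslo", 3.0),       # duplicate of Dan
]

def name_key(r):
    return r[0].strip().lower()

print(estimate_sum(records, lambda r: r[2], lambda r: r[1], name_key, m=2))
```

Here the deduplicated total is 25.0. Sampling every block (`m=3`) returns it exactly, while smaller samples trade accuracy for resolving fewer records, which is the cost saving the abstract describes.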